Practical Issues of Building Robust HMM Models Using HTK and SPHINX Systems

Author: Gregor Rozinaj
Juraj Kacur
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref

Voice Operated Information System in Slovak

Author: Jarina Roman
Juhár Jozef
Rozinaj Gregor
Rusko Milan
Trnka Marián
Čižmár Anton
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2012

Speech communication interfaces (SCI) are now widely used in several domains. Automated spoken-language human-computer interaction can replace human-human interaction where needed. Automatic speech recognition (ASR), a key technology of SCI, has been studied extensively over the past few decades. Most present systems are based on statistical modeling at both the acoustic and linguistic levels. Speech recognition in adverse conditions has recently received increased attention, since noise resistance has become one of the major bottlenecks for the practical use of speech recognizers. Although many techniques have been developed, many challenges remain before the ultimate goal, creating machines capable of communicating with humans naturally, can be achieved. In this paper we describe the research and development of the first Slovak spoken language dialogue system (SLDS). The dialogue system is based on the DARPA Communicator architecture. The proposed system consists of the Galaxy hub and telephony, automatic speech recognition, text-to-speech, backend, transport, and VoiceXML dialogue management modules. The SCI enables multi-user interaction in the Slovak language. The functionality of the SLDS is demonstrated and tested via two pilot applications, "Weather forecast for Slovakia" and "Timetable of Slovak Railways". The required information is retrieved from Internet resources in multi-user mode through PSTN, ISDN, GSM and/or VoIP networks.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Comparison of Time & Frequency . . .

Author: Climent Nadeu
Dušan Macho
Gregor Rozinaj
Javier Hernando
Peter Jancovic

In current speech recognition systems, speech is represented by a 2-D sequence of parameters that model the temporal evolution of the spectral envelope of speech. Linear transformations or filtering along both the time and frequency axes of that 2-D sequence are used to enhance the discriminative ability and robustness of speech parameters in the HMM pattern-matching formalism. In this paper, we compare two recently reported approaches that operate on the sequence of logarithmically compressed mel-scaled filter-bank energies: the first, TIFFING (TIme and Frequency FilterING), applies FIR filters to that 2-D sequence along both axes, while the second, CTM (Cepstral Time Matrix), uses the DCT to compute a set of parameters in the 2-D transformed domain. They are compared in three ways: (1) analytically, using the Fourier transform, (2) statistically, and (3) through recognition tests with clean and noisy speech.
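
The two parameterizations contrasted in this abstract can be illustrated with a minimal sketch. It assumes the input is a (frames x bands) array of log mel filter-bank energies. The function names, the filter taps (1, 0, -1), and the number of retained DCT coefficients are hypothetical choices for illustration only; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def fir_along_axis(x, taps, axis):
    """Apply an FIR filter along one axis, keeping the original length."""
    taps = np.asarray(taps, dtype=float)
    return np.apply_along_axis(
        lambda v: np.convolve(v, taps, mode="same"), axis, x
    )

def dct2_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)
    return basis

def tiffing_like(log_fbe, freq_taps=(1.0, 0.0, -1.0), time_taps=(1.0, 0.0, -1.0)):
    """Filter along frequency (axis 1), then along time (axis 0).

    The taps are illustrative assumptions, not values from the paper.
    """
    filtered = fir_along_axis(log_fbe, freq_taps, axis=1)
    return fir_along_axis(filtered, time_taps, axis=0)

def ctm_like(log_fbe_block, n_time=3, n_freq=4):
    """2-D DCT of a block of frames, keeping the low-order coefficients.

    n_time and n_freq are assumed truncation sizes.
    """
    t, f = log_fbe_block.shape
    coeffs = dct2_matrix(t) @ log_fbe_block @ dct2_matrix(f).T
    return coeffs[:n_time, :n_freq]

# Example on random data standing in for log filter-bank energies.
rng = np.random.default_rng(0)
log_fbe = rng.normal(size=(8, 12))      # 8 frames, 12 mel bands
tiff_features = tiffing_like(log_fbe)   # same shape as the input
ctm_features = ctm_like(log_fbe)        # truncated 2-D DCT coefficients
```

Both paths are linear operations on the same 2-D sequence: the first keeps the time-frequency layout and filters it, while the second projects a block of frames onto a separable cosine basis and truncates.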

CiteSeerX